Approximate dynamic programming via direct search in the space of value function approximations
Abstract
This paper deals with approximate value iteration (AVI) algorithms for discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex on the Banach space of candidate solutions to the DP problem. This fact motivates an AVI algorithm with local search that seeks to minimize the span semi-norm of the Bellman residual over a convex value function approximation space. The novelty is that the optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for local optimality are derived. To improve the convergence rate, the procedure combines the classical AVI search direction (the Bellman residual) with a set of independent search directions. It is guaranteed to converge and satisfies at least the necessary optimality conditions over a prescribed set of directions. To illustrate the method, examples are presented for a class of problems from the literature and for a queueing problem with a large state space.
© 2010 Elsevier B.V. All rights reserved.
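The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm; it assumes a finite-state problem under a fixed policy, given by a row-stochastic matrix `P`, a stage-cost vector `g` and a discount factor `alpha`, and a linear (hence convex) approximation space V = Phi w. The function names, the step-halving rule and the polynomial features are hypothetical choices for illustration. The Bellman residual T_mu V − V = g + alpha P V − V is affine in V, and the span semi-norm sp(x) = max_i x_i − min_i x_i is convex, so sp of the residual is convex in w. The search below tries the AVI direction (the residual mapped into weight space by least squares) together with the ± coordinate directions, in the style of a generic compass search.

```python
import numpy as np


def bellman_residual(V, g, P, alpha):
    """Bellman residual T_mu V - V for a fixed policy, with T_mu V = g + alpha * P @ V."""
    return g + alpha * P @ V - V


def span(x):
    """Span semi-norm sp(x) = max(x) - min(x); it vanishes on constant vectors."""
    return float(np.max(x) - np.min(x))


def direct_search(Phi, g, P, alpha, w0=None, step=1.0, tol=1e-8, max_iter=10_000):
    """Minimize f(w) = sp(T_mu(Phi w) - Phi w) by a simple pattern search (hypothetical rule).

    Candidate directions: the AVI direction (Bellman residual projected onto the
    columns of Phi by least squares) followed by the +/- coordinate directions.
    A step is accepted only if it strictly decreases f; otherwise the step is halved.
    """
    k = Phi.shape[1]
    w = np.zeros(k) if w0 is None else np.asarray(w0, dtype=float).copy()
    f = span(bellman_residual(Phi @ w, g, P, alpha))
    coord_dirs = np.vstack([np.eye(k), -np.eye(k)])  # independent search directions
    for _ in range(max_iter):
        if step < tol:
            break
        r = bellman_residual(Phi @ w, g, P, alpha)
        d_avi, *_ = np.linalg.lstsq(Phi, r, rcond=None)  # AVI direction in weight space
        improved = False
        for d in [d_avi, *coord_dirs]:
            cand = w + step * d
            f_cand = span(bellman_residual(Phi @ cand, g, P, alpha))
            if f_cand < f:  # accept the first strictly improving direction
                w, f, improved = cand, f_cand, True
                break
        if not improved:
            step *= 0.5  # no direction improves at this step size: refine it
    return w, f


# Small illustrative instance with made-up data: 6 states, random policy dynamics.
rng = np.random.default_rng(0)
n, alpha = 6, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # make rows stochastic
g = rng.random(n)                  # per-stage cost under the fixed policy
s = np.arange(n) / (n - 1)
Phi = np.column_stack([np.ones(n), s, s**2])  # hypothetical polynomial features

w, f = direct_search(Phi, g, P, alpha)
V = Phi @ w
r = bellman_residual(V, g, P, alpha)
# (I - alpha P)^{-1} is nonnegative with row sums 1/(1 - alpha), so V_mu - V lies
# componentwise in [min r, max r] / (1 - alpha); shift V to the midpoint.
V_hat = V + (r.max() + r.min()) / (2 * (1 - alpha))
bound = span(r) / (2 * (1 - alpha))
V_mu = np.linalg.solve(np.eye(n) - alpha * P, g)  # exact policy value, for comparison
print(f"span of residual: {f:.4g}, error bound: {bound:.4g}")
print("max actual error:", np.max(np.abs(V_hat - V_mu)))
```

The error bound printed above is why a small span of the Bellman residual is a useful target: it controls the distance to the policy's true value after a constant shift. Like the procedure described in the abstract, a search of this kind only certifies that no improvement exists along the prescribed directions at the final step size, which is a necessary but not sufficient optimality condition for a nonsmooth convex objective.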
Similar articles
Tuning Approximate Dynamic Programming Policies for Ambulance Redeployment via Direct Search
In this paper we consider approximate dynamic programming methods for ambulance redeployment. We first demonstrate through simple examples how typical value function fitting techniques, such as approximate policy iteration and linear programming, may not be able to locate a high-quality policy even when the value function approximation architecture is rich enough to provide the optimal policy. ...
Approximate Linear Programming for Average-Cost Dynamic Programming
This paper extends our earlier analysis on approximate linear programming as an approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive versi...
Modify the linear search formula in the BFGS method to achieve global convergence.
Sequential Bayesian optimal experimental design via approximate dynamic programming
The design of multiple experiments is commonly undertaken via suboptimal strategies, such as batch (open-loop) design that omits feedback or greedy (myopic) design that does not account for future effects. This paper introduces new strategies for the optimal design of sequential experiments. First, we rigorously formulate the general sequential optimal experimental design (sOED) problem as a dy...
Approximate dynamic programming with Bézier Curves/Surfaces for Top-percentile Traffic Routing
Multi-homing is used by Internet Service Providers (ISPs) to connect to the Internet via different network providers. This study develops a routing strategy under multi-homing in the case where network providers charge ISPs according to top-percentile pricing (i.e., based on the h-th highest volume of traffic shipped). We call this problem the Top-percentile Traffic Routing Problem (TpTRP). Solut...
Journal title:
- European Journal of Operational Research
Volume 211
Publication year: 2011